Improved Induction Tree Training for Automatic Lexical Categorization
نویسندگان
چکیده
This paper studies a tuned version of an induction tree which is used for automatic detection of lexical word category. The database used to train the tree has several fields to describe Spanish words morpho-syntactically. All the processing is performed using only the information of the word and its actual sentence. It will be shown here that this kind of induction is good enough to perform the linguistic categorization. Index Term — Machine Learning, lexical categorization, morphology, syntactic analysis.
منابع مشابه
Integrating a Lexical Database and a Training Collection for Text Categorization
Automatic text categorization is a complex and useful task for many natural language processing applications. Recent approaches to text categorization focus more on algorithms than on resources involved in this operation. In contrast to this trend, we present an approach based on the integration of widely available resources as lexical databases and training collections to overcome current limi...
متن کاملThat's So Annoying!!!: A Lexical and Frame-Semantic Embedding Based Data Augmentation Approach to Automatic Categorization of Annoying Behaviors using #petpeeve Tweets
We propose a novel data augmentation approach to enhance computational behavioral analysis using social media text. In particular, we collect a Twitter corpus of the descriptions of annoying behaviors using the #petpeeve hashtags. In the qualitative analysis, we study the language use in these tweets, with a special focus on the fine-grained categories and the geographic variation of the langua...
متن کاملSpanish/English Cross-Lingual Categorization
This article deals with the problem of Cross-Lingual Text Categorization (CLTC), which arises when documents in different languages must be classified according to the same classification tree. We describe practical and cost-effective solutions for automatic Cross-Lingual Text Categorization, both in case a sufficient number of training examples is available for each new language and in the cas...
متن کاملCross-Lingual Text Categorization
This article deals with the problem of Cross-Lingual Text Categorization (CLTC), which arises when documents in different languages must be classified according to the same classification tree. We describe practical and cost-effective solutions for automatic Cross-Lingual Text Categorization, both in case a sufficient number of training examples is available for each new language and in the cas...
متن کاملUsing WordNet to Complement Training Information in Text Categorization
Automatic Text Categorization (TC) is a complex and useful task for many natural language applications, and is usually performed through the use of a set of manually classified documents, a training collection. We suggest the utilization of additional resources like lexical databases to increase the amount of information that TC systems make use of, and thus, to improve their performance. Our a...
متن کامل